Abstract

Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with
many important applications in machine learning and statistics. In this paper we propose a
simple and fast algorithm SVP (Singular Value Projection) for rank minimization with affine
constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints
that satisfy the restricted isometry property. We show robustness of our method to noise with
a strong geometric convergence rate even for noisy measurements. Our results improve upon
a recent breakthrough by Recht, Fazel and Parrilo [RFP07] and Lee and Bresler [LB09a] in
three significant ways: 1) our method (SVP) is significantly simpler to analyze and easier to
implement, 2) we give recovery guarantees under strictly weaker isometry assumptions, and 3) we
give geometric convergence guarantees for SVP and, as demonstrated empirically, SVP is
significantly faster on real-world and synthetic problems. In addition, we address the practically
important problem of low-rank matrix completion, which can be seen as a special case of ARMP.
However, the affine constraints defining the matrix-completion problem do not obey the restricted
isometry property in general. We empirically demonstrate that our algorithm recovers low-rank
incoherent matrices from an almost optimal number of uniformly sampled entries. We make
partial progress towards proving exact recovery and provide some intuition for the performance
of SVP applied to matrix completion by showing a more restricted isometry property. Our
algorithm outperforms existing methods, such as those of [RFP07, CR08, CT09, CCS08, KOM09],
for ARMP and the matrix-completion problem by an order of magnitude and is also significantly
more robust to noise.



*A shorter version of this paper was submitted to NIPS 2009 on June 5, 2009.



1 Introduction 



In this paper we study the general affine rank minimization problem (ARMP), 

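\[ \min \ \mathrm{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \quad X \in \mathbb{R}^{m \times n}, \ b \in \mathbb{R}^{d}, \tag{ARMP} \]

where $\mathcal{A}$ is an affine transformation from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^{d}$.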

The general affine rank minimization problem is of considerable practical interest, and many
important machine learning problems, such as matrix completion, low-dimensional metric embedding
and low-rank kernel learning, can be viewed as instances of the above problem. Unfortunately, ARMP
is NP-hard in general and is also NP-hard to approximate ([MJCD08]).

Until recently, most known methods for ARMP were heuristic in nature with few known rigorous
guarantees. The most commonly used heuristic for the problem is to assume a factorization of $X$
and optimize the resulting non-convex problem by alternating minimization [Bra03, Kor08, MB07],
alternating projections [GB00] or alternating LMIs [SIG97]. Another common approach is to relax
the rank constraint to a convex function such as the trace-norm or the log determinant [FHB01],
[FHB03]. However, most of these methods do not have any optimality guarantees. Recently, Meka
et al. [MJCD08] proposed online learning based methods for ARMP. However, their methods can
only guarantee at best a logarithmic approximation for the minimum rank.

In a recent breakthrough, Recht et al. [RFP07] obtained the first nontrivial exact-recovery
results for ARMP, guaranteeing rank minimization for affine transformations $\mathcal{A}$ that satisfy
a restricted isometry property (RIP). Define the isometry constant of $\mathcal{A}$, $\delta_k$, to be the
smallest number such that for all $X \in \mathbb{R}^{m \times n}$ of rank at most $k$,

\[ (1 - \delta_k)\|X\|_F^2 \le \|\mathcal{A}(X)\|_2^2 \le (1 + \delta_k)\|X\|_F^2. \tag{1} \]

Recht et al. show that for affine constraints with bounded isometry constants (specifically,
$\delta_{5k} < 1/10$), finding the minimum trace-norm solution recovers the minimum rank solution. Their
results were later extended to noisy measurements and isometry constants up to
$\delta_{3k} < 1/(4\sqrt{3})$ by Lee and Bresler [LB09b]. However, even the best existing optimization
algorithms for the trace-norm relaxation are relatively inefficient in practice and their results
are hard to analyze.

In another recent work, Lee and Bresler [LB09a] obtained exact-recovery guarantees for ARMP
satisfying RIP using a different approach. Lee and Bresler propose an algorithm (ADMiRA)
motivated by the orthogonal matching pursuit line of work in compressed sensing, and show that for
affine constraints with isometry constant $\delta_{4k} \le 0.04$ their algorithm recovers the optimal
solution. They also prove similar guarantees for noisy measurements and provide a geometric
convergence rate for their algorithm. However, their method is not very efficient for large
datasets and is hard to analyze.

In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) based
on the classical projected gradient algorithm. We present a simple analysis showing that SVP
recovers the minimum rank solution for affine constraints that satisfy RIP even in the presence
of noise and prove the following guarantees. Independent of our work, Goldfarb and Ma [GM09]
proposed an algorithm similar to ours. However, their analysis and formulation are different
from ours. In particular, their analysis builds on the analysis of Lee and Bresler and they require
stronger isometry assumptions, $\delta_{3k} < 1/\sqrt{30}$, than we do. In addition, we make partial
progress on analyzing SVP for the matrix completion problem and proving exact recovery.

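Theorem 1.1. Suppose the isometry constant of $\mathcal{A}$ satisfies $\delta_{2k} < 1/3$ and let
$b = \mathcal{A}(X^*)$ for a rank-$k$ matrix $X^*$. Then, SVP (Algorithm 1) with step-size
$\eta_t = 1/(1 + \delta_{2k})$ converges to $X^*$. Furthermore, SVP outputs a matrix $X$ of rank at most
$k$ such that $\|\mathcal{A}(X) - b\|_2^2 \le \epsilon$ in at most

\[ \left\lceil \frac{1}{\log\big((1-\delta_{2k})/(2\delta_{2k})\big)} \log \frac{\|b\|^2}{2\epsilon} \right\rceil \]

iterations.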
Theorem 1.2 (Main). Suppose the isometry constant of $\mathcal{A}$ satisfies $\delta_{2k} < 1/3$ and let
$b = \mathcal{A}(X^*) + e$ for a rank-$k$ matrix $X^*$ and an error vector $e \in \mathbb{R}^{d}$. Then, SVP
with step-size $\eta_t = 1/(1 + \delta_{2k})$ outputs a matrix $X$ of rank at most $k$ such that
$\|\mathcal{A}(X) - b\|_2^2 \le (C^2 + \epsilon)\frac{\|e\|^2}{2}$, $\epsilon \ge 0$, in at most

\[ \left\lceil \frac{1}{\log(1/D)} \log \frac{\|b\|^2}{(C^2 + \epsilon)\|e\|^2} \right\rceil \]

iterations for universal constants $C, D$.
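To make the noiseless bound of Theorem 1.1 concrete, the following Python sketch evaluates the iteration count for given values of $\delta_{2k}$, $\|b\|^2$ and $\epsilon$. The function name `svp_iteration_bound` and its argument names are ours and purely illustrative; the sketch also assumes the isometry constant is known, which is rarely the case in practice.

```python
import math


def svp_iteration_bound(delta_2k, b_norm_sq, eps):
    """Iteration bound of Theorem 1.1 (noiseless case).

    Evaluates ceil( log(||b||^2 / (2*eps)) / log((1 - delta) / (2*delta)) ).
    All argument names are illustrative, not taken from the paper.
    """
    if not 0.0 < delta_2k < 1.0 / 3.0:
        raise ValueError("Theorem 1.1 assumes 0 < delta_2k < 1/3")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    if b_norm_sq <= 2.0 * eps:
        # psi(X^0) = ||b||^2 / 2 is already at most eps for X^0 = 0.
        return 0
    # Inverse of the per-iteration contraction factor 2*delta / (1 - delta);
    # it exceeds 1 exactly when delta < 1/3.
    inverse_rate = (1.0 - delta_2k) / (2.0 * delta_2k)
    return math.ceil(math.log(b_norm_sq / (2.0 * eps)) / math.log(inverse_rate))
```

For instance, with $\delta_{2k} = 0.2$ the contraction factor is $1/2$, so the bound grows by one iteration each time the target $\epsilon$ is halved.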



Our analysis of SVP is motivated by the recent work in the field of compressed sensing by
Blumensath and Davies [BD09] and Garg and Khandekar [GK09]. Our results improve the results of
Recht et al. and Lee and Bresler as follows.

1. SVP is considerably simpler to analyze than the methods of Recht et al. and Lee and Bresler.
Further, we need weaker isometry assumptions on $\mathcal{A}$: we only require $\delta_{2k} < 1/3$ as
opposed to $\delta_{5k} < 1/10$ required by Recht et al., $\delta_{3k} < 1/(4\sqrt{3})$ required by Lee and
Bresler [LB09b] and $\delta_{4k} \le 0.04$ required by Lee and Bresler [LB09a].

2. SVP has a strong geometric convergence rate and is faster than using the best trace-norm 
optimization algorithms and the methods of Lee and Bresler by an order of magnitude. 

Although the restricted isometry property is natural in settings where the affine constraints
contain information about all the entries of the unknown matrix, in several cases of considerable
practical interest the affine constraints only contain local information and may not satisfy RIP
directly.

One such important problem where RIP does not hold directly is the low-rank matrix completion
problem. In the matrix completion problem we are given the entries of an unknown low-rank matrix
$X^*$ for ordered pairs $(i,j) \in \Omega \subseteq [m] \times [n]$ and the goal is to complete the missing
entries of $X^*$. A highly popular application of the matrix completion problem is in the field of
collaborative filtering, where typically the task is to predict user ratings given past ratings of
the users. Recently, a lot of attention has been given to the problem due to the Netflix
Challenge [Net]. Other applications of matrix completion include triangulation from incomplete
data and link prediction in social networks.

Similar to ARMP, the low-rank matrix completion is also NP-hard in general and most methods 
are heuristic in nature with no theoretical guarantees. The alternating least squares minimization 
heuristic and its variants [Kor08, MB07] perform the best in practice but are notoriously hard to 
analyze. 

Recently, Candes and Recht [CR08], Candes and Tao [CT09] and Keshavan et al. [KOM09]
obtained the first non-trivial results for low-rank matrix completion under a few additional
assumptions. Broadly, these papers give exact-recovery guarantees when the optimal solution $X^*$
is $\mu$-incoherent (see Definition 4.1), and the entries $\Omega$ are chosen uniformly at random with
$|\Omega| \ge C(\mu, k)\, n\, \mathrm{poly}\log n$, where $C(\mu, k)$ depends only on $\mu, k$. However, the
algorithms of the above papers, even when using methods tailored specifically for matrix
completion such as those of Cai et al. [CCS08], are quite expensive in practice and not very
tolerant to noise.

As low-rank matrix completion is a special case of ARMP, we can naturally adapt our algorithm
SVP for matrix completion. We demonstrate empirically that for a suitable step-size, SVP
significantly outperforms the methods of [CR08], [CT09], [CCS08], [KOM09] in accuracy,
computational time and tolerance to noise. Furthermore, our experiments strongly suggest (see
Figure 1) that guarantees similar to those of [CT09], [KOM09] hold for SVP, achieving exact
recovery for incoherent matrices from an almost optimal number of entries^1.

Although we do not provide a rigorous proof of exact-recovery for SVP applied to matrix com- 
pletion, we make partial progress in this direction and give strong intuition for the performance 
of SVP. We prove that though the affine constraints defining the matrix-completion problems do 



^1 It follows from a coupon collector argument that exact recovery from random samples requires $nk \log n$ samples.

Figure 1: Empirical estimate of the sampling density threshold ($p = |\Omega|/mn$) for exact matrix
completion using SVP. Note that the threshold scales as $C k \log n / n$ (with $C = 1.28$), almost
matching the $k \log n / n$ lower bound.



not obey the restricted isometry property, they obey the restricted isometry property over
incoherent matrices. This weaker RIP condition, along with a hypothesis bounding the incoherence
of the iterates of SVP, implies exact recovery of a low-rank incoherent matrix from an almost
optimal number of entries. We also provide strong empirical evidence supporting our hypothesis
bounding the incoherence of the iterates of SVP (see Figure 2).

We first present our algorithm SVP in Section 2 and present its analysis for affine constraints
satisfying RIP in Section 3. In Section 4, we specialize our algorithm SVP to the task of low-rank
matrix completion and prove a more restricted isometry property for the matrix completion
problem. In Section 6, we give empirical results for SVP applied to ARMP and matrix completion
on real-world and synthetic problems.

2 Singular Value Projection (SVP) 

Consider the following robust formulation of ARMP (RARMP),

\[ \min_X \ \psi(X) = \frac{1}{2}\|\mathcal{A}(X) - b\|_2^2 \quad \text{s.t.} \quad X \in \mathcal{C}(k) = \{X : \mathrm{rank}(X) \le k\}. \tag{RARMP} \]

The hardness of the above problem mainly comes from the non-convexity of the set of low-rank 
matrices C{k). However, in spite of the hardness of the rank constraint, the Euclidean projection 
onto the non-convex set C{k) can be computed efficiently using singular value decomposition. Our 
algorithm uses this observation along with the projected gradient method for efficiently minimizing 
the objective function specified in problem (RARMP). 

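Let $\mathcal{P}_k : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ denote the orthogonal projection onto the set
$\mathcal{C}(k)$. That is, $\mathcal{P}_k(X) = \arg\min_Y \{\|Y - X\|_F : Y \in \mathcal{C}(k)\}$. It is well known
that $\mathcal{P}_k(X)$ can be computed efficiently by computing the top $k$ singular values and vectors
of $X$.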

In SVP, a candidate solution to ARMP is computed iteratively by starting from the all-zero
matrix and adapting the classical projected gradient descent update as follows (observe that
$\nabla\psi(X) = \mathcal{A}^T(\mathcal{A}(X) - b)$):

\[ X^{t+1} \leftarrow \mathcal{P}_k\big(X^t - \eta_t \nabla\psi(X^t)\big) = \mathcal{P}_k\big(X^t - \eta_t \mathcal{A}^T(\mathcal{A}(X^t) - b)\big). \tag{2} \]

Algorithm 1 presents our SVP algorithm. Note that the iterates $X^t$ are always low-rank,
facilitating faster computation of the SVD. See Section 5 for a more detailed discussion of the
computational issues.
Algorithm 1 Singular Value Projection (SVP) Algorithm

Require: $\mathcal{A}$, $b$, tolerance $\epsilon$, $\eta_t$ for $t = 0, 1, 2, \ldots$
Initialize: $X^0 = 0$ and $t = 0$
repeat
    $Y^{t+1} \leftarrow X^t - \eta_t \mathcal{A}^T(\mathcal{A}(X^t) - b)$
    Compute top $k$ singular vectors of $Y^{t+1}$: $U_k$, $\Sigma_k$, $V_k$
    $X^{t+1} \leftarrow U_k \Sigma_k V_k^T$
    $t \leftarrow t + 1$
until $\|\mathcal{A}(X^{t}) - b\|_2^2 \le \epsilon$
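Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the implementation used in the experiments: the affine map is represented as an explicit matrix `A_mat` of shape $d \times mn$ acting on the row-major vectorization of $X$, a constant step size is used, and the names `project_rank_k` and `svp` are hypothetical.

```python
import numpy as np


def project_rank_k(Y, k):
    """P_k(Y): best rank-k approximation of Y in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def svp(A_mat, b, shape, k, step, tol=1e-10, max_iter=500):
    """Projected gradient descent on 0.5 * ||A vec(X) - b||^2 over rank-k matrices.

    A_mat : (d, m*n) array representing the affine map (an assumption of this sketch).
    step  : constant step size eta_t, e.g. 1 / (1 + delta_2k).
    """
    X = np.zeros(shape)
    for _ in range(max_iter):
        residual = A_mat @ X.ravel() - b
        if residual @ residual <= tol:
            break
        # Gradient of psi at X is A^T (A(X) - b), reshaped back to a matrix.
        grad = (A_mat.T @ residual).reshape(shape)
        X = project_rank_k(X - step * grad, k)
    return X
```

A full SVD is used here only for brevity; since just the top $k$ singular triplets are needed, a truncated solver would be the natural choice for large problems.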



3 Analysis for Affine Constraints Satisfying RIP

We now show that SVP solves exact rank minimization for affine constraints that satisfy RIP and
prove our main results, Theorems 1.1 and 1.2. We first present a lemma that bounds the error at
the $(t+1)$-st iteration ($\psi(X^{t+1})$) with respect to the error incurred by the optimal solution
($\psi(X^*)$) and the $t$-th iteration.

Lemma 3.1. Let $X^*$ be an optimal solution of (RARMP) and let $X^t$ be the iterate obtained by
the SVP algorithm at the $t$-th iteration. Then,

\[ \psi(X^{t+1}) \le \psi(X^*) + \frac{\delta_{2k}}{1 - \delta_{2k}} \|\mathcal{A}(X^* - X^t)\|_2^2, \]

where $\delta_{2k}$ is the rank-$2k$ isometry constant of $\mathcal{A}$.

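Proof. Recall that $\psi(X) = \frac{1}{2}\|\mathcal{A}(X) - b\|_2^2$. Since $\psi(\cdot)$ is a quadratic function, we have

\[
\begin{aligned}
\psi(X^{t+1}) - \psi(X^t) &= \langle \nabla\psi(X^t), X^{t+1} - X^t \rangle + \frac{1}{2}\|\mathcal{A}(X^{t+1} - X^t)\|_2^2 \\
&\le \langle \mathcal{A}^T(\mathcal{A}(X^t) - b), X^{t+1} - X^t \rangle + \frac{1}{2}(1 + \delta_{2k})\|X^{t+1} - X^t\|_F^2,
\end{aligned}
\tag{3}
\]

where inequality (3) follows from RIP applied to the matrix $X^{t+1} - X^t$ of rank at most $2k$. Let
$Y^{t+1} = X^t - \frac{1}{1+\delta_{2k}}\mathcal{A}^T(\mathcal{A}(X^t) - b)$ and

\[ f_t(X) = \langle \mathcal{A}^T(\mathcal{A}(X^t) - b), X - X^t \rangle + \frac{1}{2}(1 + \delta_{2k})\|X - X^t\|_F^2. \]

Then,

\[
\begin{aligned}
f_t(X) &= \frac{1}{2}(1 + \delta_{2k})\left[ \|X - X^t\|_F^2 + 2\left\langle \frac{\mathcal{A}^T(\mathcal{A}(X^t) - b)}{1 + \delta_{2k}}, X - X^t \right\rangle \right] \\
&= \frac{1}{2}(1 + \delta_{2k})\|X - Y^{t+1}\|_F^2 - \frac{1}{2(1 + \delta_{2k})}\|\mathcal{A}^T(\mathcal{A}(X^t) - b)\|_F^2.
\end{aligned}
\]

Now, by definition, $\mathcal{P}_k(Y^{t+1}) = X^{t+1}$ is the minimizer of $f_t(X)$ over all matrices
$X \in \mathcal{C}(k)$ (of rank at most $k$). In particular, $f_t(X^{t+1}) \le f_t(X^*)$. Thus,

\[
\begin{aligned}
\psi(X^{t+1}) - \psi(X^t) \le f_t(X^{t+1}) \le f_t(X^*) &= \langle \mathcal{A}^T(\mathcal{A}(X^t) - b), X^* - X^t \rangle + \frac{1}{2}(1 + \delta_{2k})\|X^* - X^t\|_F^2 \\
&\le \langle \mathcal{A}^T(\mathcal{A}(X^t) - b), X^* - X^t \rangle + \frac{1}{2} \cdot \frac{1 + \delta_{2k}}{1 - \delta_{2k}}\|\mathcal{A}(X^* - X^t)\|_2^2 \qquad (4) \\
&= \psi(X^*) - \psi(X^t) + \frac{\delta_{2k}}{1 - \delta_{2k}}\|\mathcal{A}(X^* - X^t)\|_2^2,
\end{aligned}
\]

where inequality (4) follows from RIP applied to $X^* - X^t$. □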

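We now prove that SVP obtains the optimal solution for ARMP with the restricted isometry
property.

Proof of Theorem 1.1. Using Lemma 3.1 and the fact that $\psi(X^*) = 0$ for the noiseless case,

\[ \psi(X^{t+1}) \le \frac{\delta_{2k}}{1 - \delta_{2k}}\|\mathcal{A}(X^* - X^t)\|_2^2 = \frac{2\delta_{2k}}{1 - \delta_{2k}}\psi(X^t). \]

Also, note that for $\delta_{2k} < 1/3$, $\frac{2\delta_{2k}}{1 - \delta_{2k}} < 1$. Hence, $\psi(X^\tau) \le \epsilon$ where
$\tau = \left\lceil \frac{1}{\log((1-\delta_{2k})/(2\delta_{2k}))} \log\frac{\psi(X^0)}{\epsilon} \right\rceil$.
Now, the SVP algorithm is initialized using $X^0 = 0$, i.e., $\psi(X^0) = \frac{\|b\|^2}{2}$. Hence,

\[ \tau = \left\lceil \frac{1}{\log\big((1-\delta_{2k})/(2\delta_{2k})\big)} \log\frac{\|b\|^2}{2\epsilon} \right\rceil. \]

□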



Next, we prove the noisy version of Theorem 1.1. 

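Proof of Theorem 1.2. Let the current solution $X^t$ satisfy $\psi(X^t) \ge C^2\|e\|^2/2$, where $C > 0$ is a
universal constant. Using Lemma 3.1 and the fact that $b - \mathcal{A}(X^*) = e$,

\[
\begin{aligned}
\psi(X^{t+1}) &\le \frac{\|e\|^2}{2} + \frac{\delta_{2k}}{1 - \delta_{2k}}\|b - \mathcal{A}(X^t) - e\|_2^2 \\
&\le \frac{\|e\|^2}{2} + \frac{\delta_{2k}}{1 - \delta_{2k}}\left( \|b - \mathcal{A}(X^t)\|_2^2 + 2\|e\|\,\|b - \mathcal{A}(X^t)\|_2 + \|e\|^2 \right) \\
&\le \frac{\psi(X^t)}{C^2} + \frac{2\delta_{2k}}{1 - \delta_{2k}}\left( 1 + \frac{2}{C} + \frac{1}{C^2} \right)\psi(X^t) \\
&= D\,\psi(X^t),
\end{aligned}
\]

where $D = \frac{1}{C^2} + \frac{2\delta_{2k}}{1 - \delta_{2k}}\left(1 + \frac{1}{C}\right)^2$. Recall that $\delta_{2k} < 1/3$. Hence,
selecting $C > (1 + \delta_{2k})/(1 - 3\delta_{2k})$, we get $D < 1$. Also, $\psi(X^0) = \psi(0) = \|b\|^2/2$. Hence,
$\psi(X^\tau) \le (C^2 + \epsilon)\|e\|^2/2$ where

\[ \tau = \left\lceil \frac{1}{\log(1/D)} \log\frac{\|b\|^2}{(C^2 + \epsilon)\|e\|^2} \right\rceil. \]

□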



4 Matrix Completion 

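We first describe the low-rank matrix completion problem formally. Let
$\mathcal{P}_\Omega : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ denote the projection onto the index set $\Omega$. That is,
$(\mathcal{P}_\Omega(X))_{ij} = X_{ij}$ for $(i,j) \in \Omega$ and $(\mathcal{P}_\Omega(X))_{ij} = 0$ otherwise. Then, the
low-rank matrix completion problem (MCP) can be formulated as follows,

\[ \min_X \ \mathrm{rank}(X) \quad \text{s.t.} \quad \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(X^*), \quad X \in \mathbb{R}^{m \times n}. \tag{MCP} \]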



Observe that the matrix completion problem is a special case of ARMP. However, the affine
constraints that define MCP, $\mathcal{P}_\Omega$, do not satisfy RIP in general. Thus Theorems 1.1, 1.2 above
and the results of Recht et al. [RFP07] do not directly apply to MCP. The first non-trivial results
for MCP were obtained recently by Candes and Recht [CR08], Keshavan et al. [KOM09] and Candes
and Tao [CT09]. These works show exact recovery of the unknown matrix $X^*$ when the observed
entries are sampled uniformly and $X^*$ is incoherent in the sense defined below.
Definition 4.1 (Incoherence). A matrix $X \in \mathbb{R}^{m \times n}$ with singular value decomposition
$X = U \Sigma V^T$ is $\mu$-incoherent if

\[ \max_{i,j} |U_{ij}| \le \frac{\sqrt{\mu}}{\sqrt{m}}, \qquad \max_{i,j} |V_{ij}| \le \frac{\sqrt{\mu}}{\sqrt{n}}. \]

Intuitively, high incoherence (i.e., $\mu$ is small) implies that the non-zero entries of $X$ are not
concentrated in a small number of entries. Hence, a random sampling of the matrix should provide
enough information to reconstruct the entire matrix.
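As a small numerical companion to Definition 4.1, the sketch below computes the smallest $\mu$ for which a given matrix is $\mu$-incoherent, namely $\mu = \max(m \max_{i,j} U_{ij}^2,\ n \max_{i,j} V_{ij}^2)$. The function name `incoherence` and the relative threshold used to decide the numerical rank are our own choices and not part of the paper.

```python
import numpy as np


def incoherence(X, rank=None, rtol=1e-10):
    """Smallest mu such that X is mu-incoherent in the sense of Definition 4.1.

    Only singular vectors belonging to nonzero singular values are used;
    `rtol` (an assumption of this sketch) decides which values count as nonzero.
    """
    m, n = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if rank is None:
        rank = int(np.sum(s > rtol * s[0])) if s[0] > 0 else 0
    if rank == 0:
        raise ValueError("incoherence is undefined for the zero matrix")
    U_r, V_r = U[:, :rank], Vt[:rank, :].T
    # max |U_ij| <= sqrt(mu / m)  <=>  mu >= m * max U_ij^2, and likewise for V.
    return max(m * np.max(U_r ** 2), n * np.max(V_r ** 2))
```

The all-ones matrix attains the smallest possible value $\mu = 1$, whereas a matrix with a single non-zero entry has $\mu = \max(m, n)$, matching the intuition that its mass is concentrated in one position.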

As matrix completion is a special case of ARMP, we can apply SVP for matrix completion. We
apply SVP to matrix completion with step-size $\eta_t = 1/((1 + \delta)p)$, where $p$ is the density of
sampled entries and $0 < \delta < 1/3$ is a parameter depending on how large $p$ is, leading to the update

\[ X^{t+1} \leftarrow \mathcal{P}_k\left( X^t - \frac{1}{(1 + \delta)p}\big(\mathcal{P}_\Omega(X^t) - \mathcal{P}_\Omega(X^*)\big) \right). \tag{5} \]
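Update (5) translates directly into code. The sketch below is a minimal dense-matrix illustration under our own assumptions: the observed set is given as a boolean array `mask`, the density $p$ is estimated as the fraction of observed entries, and the name `svp_complete` and the default `delta=0.2` are hypothetical. Recovery from partial observations is what Conjecture 4.3 hypothesizes; the code itself carries no guarantee.

```python
import numpy as np


def svp_complete(M_obs, mask, k, delta=0.2, tol=1e-10, max_iter=1000):
    """SVP for matrix completion, following update (5).

    M_obs : array whose entries are used only where mask is True.
    mask  : boolean array marking the observed index set Omega.
    """
    p = mask.mean()                      # empirical sampling density |Omega| / (m n)
    step = 1.0 / ((1.0 + delta) * p)     # eta_t = 1 / ((1 + delta) p)
    X = np.zeros(M_obs.shape, dtype=float)
    for _ in range(max_iter):
        # P_Omega(X^t) - P_Omega(X^*): residual on observed entries, zero elsewhere.
        R = np.where(mask, X - M_obs, 0.0)
        if np.sum(R ** 2) <= tol:
            break
        U, s, Vt = np.linalg.svd(X - step * R, full_matrices=False)
        X = (U[:, :k] * s[:k]) @ Vt[:k, :]   # projection P_k onto rank-k matrices
    return X
```

Forming the dense matrix and its full SVD keeps the sketch short; Section 5 discusses how the low-rank plus sparse structure of the iterate can be exploited instead.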

We now provide some intuition for our choice of step-size $\eta_t$ and make partial progress towards
proving that SVP achieves exact recovery for low-rank incoherent matrices. We show that though
the affine constraints defining MCP, $\mathcal{P}_\Omega$, do not satisfy RIP for all low-rank matrices, they
satisfy RIP for all low-rank incoherent matrices. Thus, if the iterates appearing in SVP remain
incoherent throughout the execution of the algorithm, then Theorem 1.1 would imply recovery of the
unknown entries of the matrix. Empirical evidence strongly supports our hypothesis that the
incoherence of the iterates arising in SVP remains bounded.

Figure 1 plots the threshold sampling density $p$ beyond which matrix completion for randomly
generated matrices is solved exactly by SVP for fixed $k$ and varying matrix sizes $n$. Note that the
density threshold matches the optimal bound of $O(k \log n / n)$ with the constant being $C = 1.28$.
Figure 2 plots the maximum incoherence $\max_t \mu(X^t) = \sqrt{n}\,\max_{t,i,j} |U^t_{ij}|$, where $U^t$ are the
left singular vectors of the intermediate iterates $X^t$ computed by SVP. The figure clearly shows
that the incoherence $\mu(X^t)$ of the iterates is bounded by a constant independent of the matrix
size $n$ and density $p$ throughout the execution of SVP.

Fix an incoherent matrix $X \in \mathbb{R}^{m \times n}$ of rank at most $k$ and let $\Omega$ be sampled according to
the Bernoulli model with each $(i,j) \in \Omega$ independently with probability $p$. Then,
$E[\|\mathcal{P}_\Omega(X)\|_F^2] = p\|X\|_F^2$. Further, by Chernoff bounds, for $\delta > 0$ and
$p \ge C k^2 \log n / m$ for a universal constant $C$, with high probability

\[ (1 - \delta)p\|X\|_F^2 \le \|\mathcal{P}_\Omega(X)\|_F^2 \le (1 + \delta)p\|X\|_F^2. \tag{6} \]
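The role of incoherence in (6) can be seen in a quick simulation. The sketch below, with the hypothetical helper `sampling_ratio`, draws $\Omega$ from the Bernoulli model and returns $\|\mathcal{P}_\Omega(X)\|_F^2 / (p\|X\|_F^2)$; a value near 1 means (6) holds with small $\delta$ for that draw. The matrices used in the usage note are our own toy choices.

```python
import numpy as np


def sampling_ratio(X, p, rng):
    """Return ||P_Omega(X)||_F^2 / (p ||X||_F^2) for one Bernoulli(p) draw of Omega."""
    mask = rng.random(X.shape) < p       # each entry observed independently w.p. p
    return np.sum(X[mask] ** 2) / (p * np.sum(X ** 2))
```

For a flat matrix such as `np.ones((50, 50))` the ratio concentrates around 1, while for a matrix with a single non-zero entry it can only take the values 0 or $1/p$, so no bound of the form (6) with $\delta < 1$ can hold for it.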

Combining the above Chernoff bound estimate with a union bound over low-rank incoherent
matrices, we obtain the following restricted isometry property for the projection operator
$\mathcal{P}_\Omega$ restricted to low-rank incoherent matrices. See Section 4.1 for a detailed proof.

Theorem 4.2. There exists a constant $C > 0$ such that the following holds for all $0 < \delta < 1$,
$\mu \ge 1$, $n \ge m \ge 3$: For $\Omega \subseteq [m] \times [n]$ chosen according to the Bernoulli model with density
$p \ge C\mu^2 k^2 \log n / (\delta^2 m)$, with probability at least $1 - \exp(-n \log n)$, the restricted isometry
property in (6) holds for all $\mu$-incoherent matrices $X$ of rank at most $k$.

Motivated by the above theorem and supported by empirical evidence (Figures 1, 2) we
hypothesize that SVP achieves exact recovery from an almost optimal number of samples.

Conjecture 4.3. Fix $\mu, k$ and $\delta \le 1/3$. Then, there exists a constant $C$ such that for a
$\mu$-incoherent matrix $X^*$ of rank at most $k$ and $\Omega$ sampled from the Bernoulli model with density
$p \ge C\mu^2 k^2 \log n / (\delta^2 m)$, SVP with step-size $\eta_t = 1/((1 + \delta)p)$ converges to $X^*$ with high
probability. Moreover, SVP outputs a matrix $X$ of rank at most $k$ such that
$\|\mathcal{P}_\Omega(X) - \mathcal{P}_\Omega(X^*)\|_F^2 \le \epsilon$ after $O_{\mu,k}\left(\left\lceil \log\frac{1}{\epsilon} \right\rceil\right)$ iterations.
Figure 2: Maximum incoherence $\max_t \mu(X^t)$ over the iterates of SVP for varying densities $p$ and
sizes $n$ of randomly generated matrices (averaged over 20 runs). Note that the incoherence is
bounded by a constant, supporting Conjecture 4.3.



4.1 RIP for Matrix Completion on Incoherent Matrices 

We now prove the RIP property of Theorem 4.2 for the projection operator $\mathcal{P}_\Omega$. To prove
Theorem 4.2 we first show the theorem for a discrete collection of matrices using Chernoff type
large-deviation bounds and use standard quantization arguments to generalize to the continuous
case. We first introduce some notation.

Definition 4.4. For a matrix $X \in \mathbb{R}^{m \times n}$, let $\|X\|_{mx} = \max_{i,j} |X_{ij}|$ and call $X$
$\alpha$-regular if

\[ \|X\|_{mx} \le \frac{\alpha}{\sqrt{mn}}\|X\|_F. \]

We need Bernstein's inequality [Wik09] stated below. 

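Lemma 4.5 (Bernstein's inequality). Let $X_1, X_2, \ldots, X_n$ be independent random variables with
$E[X_i] = 0$ for all $i$. Furthermore, let $|X_i| \le M$. Then, for all $t > 0$,

\[ \Pr\Big[\sum_i X_i > t\Big] \le \exp\left( -\frac{t^2/2}{\sum_i \mathrm{Var}(X_i) + Mt/3} \right). \]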

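Lemma 4.6. Fix an $\alpha$-regular $X \in \mathbb{R}^{m \times n}$ and $0 < \delta < 1$. Then, for $\Omega \subseteq [m] \times [n]$
chosen according to the Bernoulli model, with each pair $(i,j) \in \Omega$ chosen independently with
probability $p$,

\[ \Pr\Big[ \big| \|\mathcal{P}_\Omega(X)\|_F^2 - p\|X\|_F^2 \big| \ge \delta p\|X\|_F^2 \Big] \le 2\exp\left( -\frac{\delta^2 p m n}{3\alpha^2} \right). \]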

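Proof. For $(i,j) \in [m] \times [n]$, let $\omega_{ij}$ be the indicator variables with $\omega_{ij} = 1$ if
$(i,j) \in \Omega$ and $0$ otherwise. Then, $\omega_{ij}$ are independent random variables with
$\Pr[\omega_{ij} = 1] = p$. Let random variable $Z_{ij} = \omega_{ij}X_{ij}^2$. Note that,

\[ E[Z_{ij}] = pX_{ij}^2, \qquad \mathrm{Var}(Z_{ij}) = p(1 - p)X_{ij}^4. \]

Observe that $|Z_{ij} - E[Z_{ij}]| \le |X_{ij}|^2 \le (\alpha^2/mn)\|X\|_F^2$. Thus,

\[ M = \max_{i,j} |Z_{ij} - E[Z_{ij}]| \le \frac{\alpha^2}{mn}\|X\|_F^2. \tag{7} \]

Now, define random variable $S = \sum_{i,j} Z_{ij} = \sum_{i,j} \omega_{ij}X_{ij}^2 = \|\mathcal{P}_\Omega(X)\|_F^2$. Note that
$E[S] = p\|X\|_F^2$. Since $Z_{ij}$ are independent random variables,

\[ \mathrm{Var}(S) = \sum_{i,j} p(1 - p)X_{ij}^4 \le p\Big(\max_{i,j} X_{ij}^2\Big)\sum_{i,j} X_{ij}^2 \le \frac{p\alpha^2}{mn}\|X\|_F^4. \tag{8} \]

Using Bernstein's inequality (Lemma 4.5) for $S - E[S]$ with $t = \delta p\|X\|_F^2$ and Equations (7)
and (8) we get,

\[
\begin{aligned}
\Pr\big[ |S - E[S]| > t \big] &\le 2\exp\left( -\frac{t^2/2}{\mathrm{Var}(S) + Mt/3} \right) \\
&\le 2\exp\left( -\frac{\delta^2 p m n}{2\alpha^2(1 + \delta/3)} \right) \\
&\le 2\exp\left( -\frac{\delta^2 p m n}{3\alpha^2} \right).
\end{aligned}
\]

□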



We now discretize the space of low-rank incoherent matrices so as to be able to use the above 
lemma with a union bound. We need the following simple lemmas. 

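Lemma 4.7. Let $X \in \mathbb{R}^{m \times n}$ be a $\mu$-incoherent matrix of rank at most $k$. Then $X$ is
$\mu\sqrt{k}$-regular.

Proof. Let $X = U \Sigma V^T$ be the singular value decomposition of $X$. Then, $X_{ij} = U_i \Sigma V_j^T$, where
$U_i, V_j$ are the $i$-th and $j$-th rows of $U, V$ respectively. Now,

\[ |X_{ij}| = |e_i^T U \Sigma V^T e_j| = \Big| \sum_{l=1}^{k} U_{il}\Sigma_{ll}V_{jl} \Big| \le \sum_{l=1}^{k} \Sigma_{ll}|U_{il}||V_{jl}|. \]

Since $X$ is $\mu$-incoherent,

\[ |X_{ij}| \le \sum_{l=1}^{k} \Sigma_{ll}|U_{il}||V_{jl}| \le \frac{\mu}{\sqrt{mn}}\sum_{l=1}^{k} \Sigma_{ll} \le \frac{\mu}{\sqrt{mn}} \cdot \sqrt{k}\Big(\sum_{l=1}^{k} \Sigma_{ll}^2\Big)^{1/2} = \frac{\mu\sqrt{k}}{\sqrt{mn}}\|X\|_F. \]

□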



Lemma 4.8. Let $a, b, c, x, y, z \in [-1, 1]$. Then,

\[ |abc - xyz| \le |a - x| + |b - y| + |c - z|. \]

The following lemma shows that the space of low-rank $\mu$-incoherent matrices can be discretized
into a reasonably small set of regular matrices such that every low-rank $\mu$-incoherent matrix is
close to a matrix from the set.

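Lemma 4.9. For all $0 < \epsilon < 1/2$, $\mu \ge 1$, $m, n \ge 3$ and $k \ge 1$, there exists a set
$S(\mu, \epsilon) \subseteq \mathbb{R}^{m \times n}$ with $|S(\mu, \epsilon)| \le (mnk/\epsilon)^{3(m+n)k}$ such that the following holds. For
any $\mu$-incoherent $X \in \mathbb{R}^{m \times n}$ of rank $k$ with $\|X\|_2 = 1$, there exists $Y \in S(\mu, \epsilon)$ such
that $\|Y - X\|_F < \epsilon$ and $Y$ is $(4\mu\sqrt{k})$-regular.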
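Proof. We construct $S(\mu, \epsilon)$ by discretizing the space of low-rank incoherent matrices. Let
$\rho = \epsilon/\sqrt{9k^2mn}$ and $D(\rho) = \{\rho i : i \in \mathbb{Z}, |i| \le \lfloor 1/\rho \rfloor\}$. Let

\[
\begin{aligned}
U(\rho) &= \{U \in \mathbb{R}^{m \times k} : U_{ij} \in (\sqrt{\mu}/\sqrt{m}) \cdot D(\rho)\}, \\
V(\rho) &= \{V \in \mathbb{R}^{n \times k} : V_{ij} \in (\sqrt{\mu}/\sqrt{n}) \cdot D(\rho)\}, \\
\Sigma(\rho) &= \{\Sigma \in \mathbb{R}^{k \times k} : \Sigma_{ij} = 0 \text{ for } i \ne j, \ \Sigma_{ii} \in D(\rho)\}, \\
S(\mu, \epsilon) &= \{U \Sigma V^T : U \in U(\rho), \ \Sigma \in \Sigma(\rho), \ V \in V(\rho)\}.
\end{aligned}
\]

We will show that $S(\mu, \epsilon)$ satisfies the conditions of the lemma. Observe that
$|D(\rho)| < 2/\rho$. Thus,

\[ |U(\rho)| < (2/\rho)^{mk}, \qquad |V(\rho)| < (2/\rho)^{nk}, \qquad |\Sigma(\rho)| < (2/\rho)^{k}. \]

Hence, $|S(\mu, \epsilon)| < (2/\rho)^{mk + nk + k} < (mnk/\epsilon)^{3(m+n)k}$.

Fix a $\mu$-incoherent $X \in \mathbb{R}^{m \times n}$ of rank at most $k$ with $\|X\|_2 = 1$. Let the singular value
decomposition of $X$ be $X = U \Sigma V^T$. Let $U_1$ be the matrix obtained by rounding entries of $U$ to
integer multiples of $\sqrt{\mu}\rho/\sqrt{m}$ as follows: for $(i, l) \in [m] \times [k]$, let

\[ (U_1)_{il} = \frac{\sqrt{\mu}\rho}{\sqrt{m}} \left\lfloor \frac{U_{il}\sqrt{m}}{\sqrt{\mu}\rho} \right\rfloor. \]

Now, since $|U_{il}| \le \sqrt{\mu}/\sqrt{m}$, it follows that $U_1 \in U(\rho)$. Further, for all
$i \in [m], l \in [k]$,

\[ |(U_1)_{il} - U_{il}| < \frac{\sqrt{\mu}\rho}{\sqrt{m}} < \rho. \]

Similarly, define $V_1, \Sigma_1$ by rounding entries of $V, \Sigma$ to integer multiples of
$\sqrt{\mu}\rho/\sqrt{n}$ and $\rho$ respectively. Then, $V_1 \in V(\rho)$, $\Sigma_1 \in \Sigma(\rho)$ and for
$(j, l) \in [n] \times [k]$,

\[ |(V_1)_{jl} - V_{jl}| < \frac{\sqrt{\mu}\rho}{\sqrt{n}} < \rho, \qquad |(\Sigma_1)_{ll} - \Sigma_{ll}| < \rho. \]

Let $X(\rho) = U_1 \Sigma_1 V_1^T$. Then, by the above equations and Lemma 4.8, for
$i \in [m], l \in [k], j \in [n]$,

\[ |(U_1)_{il}(\Sigma_1)_{ll}(V_1)_{jl} - U_{il}\Sigma_{ll}V_{jl}| < 3\rho. \]

Thus, for $(i, j) \in [m] \times [n]$,

\[
\begin{aligned}
|X(\rho)_{ij} - X_{ij}| &= \Big| \sum_{l=1}^{k} (U_1)_{il}(\Sigma_1)_{ll}(V_1)_{jl} - U_{il}\Sigma_{ll}V_{jl} \Big| \\
&\le \sum_{l=1}^{k} \big| (U_1)_{il}(\Sigma_1)_{ll}(V_1)_{jl} - U_{il}\Sigma_{ll}V_{jl} \big| \\
&< 3k\rho.
\end{aligned}
\tag{9}
\]

Using Lemma 4.7 and Equation (9),

\[ \|X(\rho)\|_{mx} \le \|X\|_{mx} + 3k\rho \le \frac{\mu\sqrt{k}}{\sqrt{mn}} \cdot \|X\|_F + \frac{\epsilon}{\sqrt{mn}}. \]

Also, using (9),

\[ \|X(\rho) - X\|_F^2 = \sum_{i,j} |X(\rho)_{ij} - X_{ij}|^2 < 9k^2mn\rho^2 = \epsilon^2. \]

Furthermore, using the triangle inequality, $\|X(\rho)\|_F > \|X\|_F - \epsilon > \|X\|_F/2$. Since $\epsilon < 1$ and
$\mu\sqrt{k}\|X\|_F \ge 1$,

\[ \|X(\rho)\|_{mx} \le \frac{2\mu\sqrt{k}}{\sqrt{mn}} \cdot \|X\|_F \le \frac{4\mu\sqrt{k}}{\sqrt{mn}} \cdot \|X(\rho)\|_F. \]

Thus, $X(\rho)$ is $4\mu\sqrt{k}$-regular. The lemma now follows by taking $Y = X(\rho)$. □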

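We now prove Theorem 4.2 by combining Lemmas 4.6 and 4.9.

Proof of Theorem 4.2. Let $m \le n$, $\epsilon = \delta/(9mnk)$ and

\[ S'(\mu, \epsilon) = \{Y : Y \in S(\mu, \epsilon), \ Y \text{ is } 4\mu\sqrt{k}\text{-regular}\}, \]

where $S(\mu, \epsilon)$ is as in Lemma 4.9. Then, by Lemma 4.6 (applied with $\alpha = 4\mu\sqrt{k}$) and a union
bound,

\[
\begin{aligned}
\Pr\Big[ \exists\, Y \in S'(\mu, \epsilon) : \big| \|\mathcal{P}_\Omega(Y)\|_F^2 - p\|Y\|_F^2 \big| \ge \delta p\|Y\|_F^2 \Big]
&\le 2\left( \frac{mnk}{\epsilon} \right)^{3(m+n)k} \exp\left( -\frac{\delta^2 p m n}{48\mu^2 k} \right) \\
&\le \exp(C_1 nk \log n) \cdot \exp\left( -\frac{\delta^2 p m n}{48\mu^2 k} \right),
\end{aligned}
\]

where $C_1 > 0$ is a constant independent of $m, n, k$.

Thus, if $p \ge C\mu^2 k^2 \log n / (\delta^2 m)$, where $C = 48(C_1 + 1)$, with probability at least
$1 - \exp(-n \log n)$, the following holds:

\[ \forall\, Y \in S'(\mu, \epsilon), \quad \big| \|\mathcal{P}_\Omega(Y)\|_F^2 - p\|Y\|_F^2 \big| \le \delta p\|Y\|_F^2. \tag{10} \]

As the statement of the theorem is invariant under scaling, it is enough to show the statement for
all $\mu$-incoherent matrices $X$ of rank at most $k$ and $\|X\|_2 = 1$. Fix such an $X$ and suppose that (10)
holds. Now, by Lemma 4.9 there exists $Y \in S'(\mu, \epsilon)$ such that $\|Y - X\|_F \le \epsilon$. Moreover,

\[ \|Y\|_F^2 \le (\|X\|_F + \epsilon)^2 \le \|X\|_F^2 + 2\epsilon\|X\|_F + \epsilon^2 \le \|X\|_F^2 + 3\epsilon k. \]

Proceeding similarly, we can show that

\[ \big| \|X\|_F^2 - \|Y\|_F^2 \big| \le 3\epsilon k. \tag{11} \]

Further, starting with $\|\mathcal{P}_\Omega(Y - X)\|_F \le \|Y - X\|_F \le \epsilon$ and arguing as above we get that

\[ \big| \|\mathcal{P}_\Omega(Y)\|_F^2 - \|\mathcal{P}_\Omega(X)\|_F^2 \big| \le 3\epsilon k. \tag{12} \]

Combining inequalities (11), (12) above, we have

\[
\begin{aligned}
\big| \|\mathcal{P}_\Omega(X)\|_F^2 - p\|X\|_F^2 \big| &\le \big| \|\mathcal{P}_\Omega(X)\|_F^2 - \|\mathcal{P}_\Omega(Y)\|_F^2 \big| + p\big| \|X\|_F^2 - \|Y\|_F^2 \big| + \big| \|\mathcal{P}_\Omega(Y)\|_F^2 - p\|Y\|_F^2 \big| \\
&\le 6\epsilon k + \delta p\|Y\|_F^2 && \text{from (10), (11), (12)} \\
&\le 6\epsilon k + \delta p(\|X\|_F^2 + 3\epsilon k) && \text{from (11)} \\
&\le 9\epsilon k + \delta p\|X\|_F^2 \\
&\le 2\delta p\|X\|_F^2, && \text{since } \|X\|_F^2 \ge 1.
\end{aligned}
\]

The theorem now follows. □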
5 Computational Issues and Related Work

The affine rank minimization problem is a natural generalization to matrices of the following
compressed sensing problem for vectors:

\[ \min_x \ \|x\|_0 \quad \text{s.t.} \quad Ax = b, \tag{13} \]

where $\|x\|_0$ is the $\ell_0$ norm (size of the support) of $x \in \mathbb{R}^{n}$, $A \in \mathbb{R}^{m \times n}$ is the
sensing matrix and $b \in \mathbb{R}^{m}$ are the measurements. Just as in the case of ARMP, the compressed
sensing problem is also NP-hard in general.

However, a number of methods have been proposed recently to solve the problem for restricted 
families of sensing matrices. Most of the methods with provable theoretical guarantees assume 
that the sensing matrix A satisfies restricted isometry properties similar to those in (1). Broadly 
speaking, existing compressed sensing approaches can be divided into three categories: 

• $\ell_1$ relaxation: These methods relax the non-convex $\ell_0$ objective function to the convex
$\ell_1$ objective function [CT05, CR07, Fuc05, DET06]. At a high level these results show that if
the sensing matrix $A$ obeys RIP or other RIP-like properties, then $\ell_1$ relaxation recovers the
optimal sparse solution from an almost optimal $O(k \log n)$ measurements.

• Basis pursuit: These methods greedily search for the subset of columns of $A$ that would
span the optimal solution. Specifically, in each iteration, columns of the sensing matrix that
have the highest correlation with the current residual measurement vector are greedily added
to the basis. Assuming RIP, basis pursuit methods also guarantee recovery of the optimal
solution from a near optimal number of measurements [TN08, NTV08].

• Iterative Hard Thresholding (IHT): IHT based methods try to minimize the $\ell_0$ norm
directly by hard thresholding [BD09, GK09] the current candidate solution to a small support
vector. Here again, exact-recovery guarantees are known assuming RIP. Recently, Garg and
Khandekar [GK09] demonstrated that their GradeS method outperforms most of the existing
compressed sensing algorithms empirically.

As ARMP is a generalization of problem (13), it is natural to ask if the above compressed
sensing algorithms can be generalized to solve ARMP. Interestingly, the answer is yes. Trace-norm
relaxation approaches [RFP07] can be seen as a direct generalization of the $\ell_1$ relaxation
approach. Similarly, the ADMiRA algorithm of Lee and Bresler [LB09a] generalizes the CoSaMP
algorithm of Tropp and Needell [TN08]. Finally, our approach is a generalization of the IHT
approach. Table 1 summarizes these three approaches and compares them in terms of a few desirable
characteristics an algorithm for ARMP should have.



Method                Generalization of      RIP constant                    Rate of convergence   Noisy measurements
Trace-norm [RFP07]    $\ell_1$ relaxation    $\delta_{5k} < 1/10$            Not known             No
Trace-norm [LB09b]    $\ell_1$ relaxation    $\delta_{3k} < 1/(4\sqrt{3})$   Not known             Yes
ADMiRA [LB09a]        Basis pursuit          $\delta_{4k} < 1/\sqrt{32}$     Geometric             Yes
SVP, this paper       IHT                    $\delta_{2k} < 1/3$             Geometric             Yes

Table 1: Comparison of the existing approaches for ARMP with our SVP method

Minimizing the trace-norm of a matrix subject to affine constraints can be cast as a semi-definite 
programming problem. However, algorithms for semi-definite programming, as used by most meth- 
ods for minimizing trace-norm, are prohibitively expensive even for moderately large datasets. 
Recently, a variety of methods, mostly based on iterative soft-thresholding, have been proposed to
solve the trace-norm minimization problem efficiently. For instance, Cai et al. [CCS08] proposed
a Singular Value Thresholding (SVT) algorithm which is based on Uzawa's algorithm [AHU58]. A
related approach based on linearized Bregman iteration was proposed by Ma et al. [MGC09] and Toh
and Yun [TY09], while Ji and Ye [JY09] proposed Nesterov's projected gradient based methods for
optimizing the trace-norm.

While the soft-thresholding based methods for trace-norm minimization are significantly faster 
than semi-definite programming approaches they suffer from an important bottleneck: though 
the final solution to the trace-norm minimization is a low-rank matrix, the rank of the iterate in 
intermediate iterations can be large. In contrast, the rank of the iterates in our method is always 
equal to the rank of the optimal solution. 

Also, though minimizing the trace-norm approximates the low-rank solution even in the presence
of noise (see [CP09], [LB09b] for instance), noise poses considerable computational challenges
for trace-norm optimization. Cai et al. propose a variant of SVT for handling noise that performs
moderately well for uniformly bounded noise. However, the performance of SVT worsens
considerably in the presence of outlier noise. SVP on the other hand is robust to both outlier and
uniformly bounded noise as it minimizes the cumulative loss function $\|\mathcal{A}(X) - b\|_2^2$.

For the case of low-rank matrix completion, Candes and Recht [CR08] obtained the first
non-trivial results for the problem, obtaining guaranteed completion for incoherent matrices $X^*$
and randomly sampled entries $\Omega$. Candes and Recht show that for $X^*$ $\mu$-incoherent and $\Omega$ chosen
at random with $|\Omega| \ge C(\mu)\, k^2 n^{1.2}$, trace-norm relaxation recovers the optimal solution.
Building on the work of Candes and Recht, Candes and Tao [CT09] obtained the near-optimal bound of
$|\Omega| \ge \min\{C\mu^4 k^2 n \log^2 n, \ C\mu^2 k n \log^6 n\}$ for exact recovery via trace-norm
minimization. However, the analysis of Candes and Recht and of Candes and Tao is considerably
complicated, and minimizing the trace-norm, even when using methods tailored for matrix
completion such as those of Cai et al., is relatively expensive in practice.

For the case of matrix completion, SVT has the important property that the intermediate 
iterations of the algorithm only require computing the singular value decomposition of a sparse 
matrix. This facilitates the use of fast SVD packages such as PROPACK [Lar] that 
require only subroutines that compute matrix-vector products. 

Our SVP algorithm has a similar property, facilitating fast computation of the update in equation 
(5); each iteration of SVP involves computing the SVD of the matrix $Y = X^t + \mathcal{P}_\Omega(X^t - X^*)$, where 
$X^t$ is a matrix of rank at most $k$ whose SVD we know and $\mathcal{P}_\Omega(X^t - X^*)$ is a sparse matrix. Thus, 
we can compute matrix-vector products of the form $Yx$ in time $O((m + n)k + |\Omega|)$. 
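
A minimal sketch of such a matrix-vector product is given here, assuming the low-rank part is stored as SVD factors and the sparse part in coordinate form. The function name and argument layout are hypothetical, and the dense matrix is formed only to check the result on a small example.

```python
import numpy as np

def lowrank_plus_sparse_matvec(U, s, Vt, rows, cols, vals, x):
    """Compute (U diag(s) Vt + S) @ x without forming the m-by-n matrix.

    S is sparse, given in coordinate form: S[rows[i], cols[i]] = vals[i].
    """
    low_rank_part = U @ (s * (Vt @ x))            # O((m + n) k) operations
    sparse_part = np.zeros(U.shape[0])
    np.add.at(sparse_part, rows, vals * x[cols])  # one multiply-add per stored entry
    return low_rank_part + sparse_part

rng = np.random.default_rng(0)
m, n, k, nnz = 40, 30, 3, 100                     # illustrative sizes
U = rng.standard_normal((m, k))
s = rng.random(k)
Vt = rng.standard_normal((k, n))
flat = rng.choice(m * n, size=nnz, replace=False) # distinct sampled positions
rows, cols = np.divmod(flat, n)
vals = rng.standard_normal(nnz)
x = rng.standard_normal(n)

y_fast = lowrank_plus_sparse_matvec(U, s, Vt, rows, cols, vals, x)

# Dense reference, used only to validate the implicit product.
S = np.zeros((m, n))
S[rows, cols] = vals
y_dense = ((U * s) @ Vt + S) @ x
```

An iterative SVD routine would also need the analogous product with the transpose, which follows the same pattern with the roles of rows and columns exchanged.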

In a different line of work, Keshavan et al. [KOM09] obtained exact recovery from uniformly 
sampled $\Omega$ with $|\Omega| \geq C(\mu, k)\, n \log n$ using different techniques. The first iteration of SVP is 
similar to the first step of Keshavan et al. After the first iteration, however, Keshavan et al. use a 
sophisticated alternating minimization algorithm based on gradient descent on the Grassmannian 
manifold of low-rank matrices, and convergence of their alternating minimization algorithm 
is slow. The simplicity of the updates in SVP makes it both easier to implement and significantly 
less computationally intensive than the alternating minimization algorithm of Keshavan et al. 

A related problem to the matrix completion problem is the problem of low-rank plus sparse de- 
composition of a matrix addressed by Chandrasekaran et al. [CSPW09] and Wright et al. [WGRM09]. 
Interestingly, Wright et al. [WGRM09] show that the low-rank matrix completion problem can be 
reduced to the low-rank plus sparse decomposition problem. Here again, their method relies on the 
trace-norm relaxation and is significantly more computationally intensive than our algorithm. 
5.1 Selecting rank (k) 

A drawback of our SVP method is that it requires the rank $k$ of the optimal solution to be known beforehand. 
For ARMP, we propose using the following heuristic: run SVP with some initial guess $k$ and 
increment it by a fixed number (e.g., 10) until the error $\|\mathcal{A}(X) - b\|_2$ incurred by SVP stops changing. 
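
The heuristic for ARMP can be sketched as follows. This is an illustration under several assumptions that are not from the paper: the operator is stored as a matrix `A` acting on the vectorized X, its rows are orthonormalized so that a unit step size is safe, the rank is incremented by 1 rather than 10 because the instance is tiny, and "stops changing" is read as an improvement in relative residual below `tol`.

```python
import numpy as np

def svp(A, b, shape, k, eta=1.0, iters=300):
    """Sketch of SVP: gradient step on 0.5*||A vec(X) - b||^2, then rank-k projection."""
    X = np.zeros(shape)
    for _ in range(iters):
        grad = (A.T @ (A @ X.ravel() - b)).reshape(shape)
        U, s, Vt = np.linalg.svd(X - eta * grad, full_matrices=False)
        X = (U[:, :k] * s[:k]) @ Vt[:k]
    return X

def select_rank(A, b, shape, k0=1, step=1, k_max=10, tol=1e-3):
    """Increase k until the relative residual stops improving by more than tol."""
    prev_k, prev_err = None, np.inf
    for k in range(k0, k_max + 1, step):
        X = svp(A, b, shape, k)
        err = np.linalg.norm(A @ X.ravel() - b) / np.linalg.norm(b)
        if prev_err - err < tol:
            return prev_k            # the larger rank no longer reduces the error
        prev_k, prev_err = k, err
    return prev_k

rng = np.random.default_rng(0)
n, true_k = 15, 2                    # small illustrative instance
X_true = rng.standard_normal((n, true_k)) @ rng.standard_normal((true_k, n))
d = 6 * true_k * n                   # number of affine constraints
G = rng.standard_normal((n * n, d))
A = np.linalg.qr(G)[0].T             # random constraints with orthonormalized rows
b = A @ X_true.ravel()

k_hat = select_rank(A, b, (n, n))
```

With orthonormal rows and `eta = 1`, each iteration amounts to projecting onto the affine constraint set and then onto the rank-k matrices, so the residual cannot increase from one iteration to the next. Other operators would need a different step size.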

For the matrix completion problem, in the first step of our SVP method, we compute singular 
values incrementally until we find a significant gap between consecutive singular values. This heuristic is justified 
because Keshavan et al. [KOM09] show that the top $k$ ($k$ being the rank of the optimal solution) singular 
values of the sampled matrix approximate the underlying matrix well, i.e., there should be a gap 
between the $k$-th and $(k+1)$-th singular values. 
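
One possible reading of the gap heuristic is sketched below. It is an assumption-laden illustration rather than the authors' procedure: it computes all singular values at once instead of incrementally, restricts the search to the top `k_max` values, and takes "significant gap" to mean the largest ratio between consecutive singular values.

```python
import numpy as np

def estimate_rank_by_gap(M_obs, k_max=10):
    """Return the k <= k_max with the largest ratio sigma_k / sigma_{k+1}."""
    s = np.linalg.svd(M_obs, compute_uv=False)[: k_max + 1]
    ratios = s[:-1] / s[1:]          # ratios[i] = sigma_{i+1} / sigma_{i+2}
    return int(np.argmax(ratios)) + 1

rng = np.random.default_rng(0)
n, k, p = 300, 3, 0.5                # illustrative size, rank and sampling density
X_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
mask = rng.random((n, n)) < p        # Bernoulli(p) sampling of the entries
M_obs = np.where(mask, X_true, 0.0)  # sampled matrix, zeros at unobserved entries

k_hat = estimate_rank_by_gap(M_obs)
```

At lower sampling densities the gap becomes less pronounced, so the ratio criterion and the choice of `k_max` would need more care.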

6 Experimental Results 

In this section, we empirically evaluate our SVP method for the affine rank minimization and low- 
rank matrix completion problems. For both problems we present empirical results on synthetic 
as well as real-world datasets. For ARMP we compare our method against the trace-norm based 
singular value thresholding (SVT) method [CCS08]. Note that although Cai et al. present the SVT 
algorithm in the context of the matrix completion problem, it can be easily adapted for ARMP. For 
matrix completion we compare against SVT, ADMiRA [LB09a], the spectral matrix completion 
(SMC) method of Keshavan et al. [KOM09], and regularized alternating least squares minimization 
(ALS). We use our own implementation of ALS and SVT for ARMP, while for matrix completion 
we use the code provided by the respective authors for SVT, ADMiRA and SMC. We report results 
averaged over 20 runs. All the methods are implemented in Matlab and use mex files. 

6.1 Affine Rank Minimization 

We first compare our method against SVT on random instances of ARMP. We generate random 
matrices $X \in \mathbb{R}^{n \times n}$ of different sizes $n$ and fixed rank $k = 5$. We then generate $d = 6kn$ random 
affine constraint matrices $A_i$, $1 \leq i \leq d$, and compute $b = \mathcal{A}(X)$. Figure 3 (a) compares the 
computational time required by SVP and SVT (in log-scale) for achieving a relative error 
($\|\mathcal{A}(X) - b\|_2 / \|b\|_2$) of $10^{-3}$, and shows that our method requires many fewer iterations and is significantly 
faster than SVT. 
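
The setup of such a random instance and the relative error measure can be sketched as below. The Gaussian distribution of the constraint matrices, the size n = 40, and the helper names are assumptions made for illustration. The measurements are taken as Tr(A_i X).

```python
import numpy as np

def apply_A(A_mats, X):
    """Measurements b_i = Tr(A_i X) for a stack of constraint matrices A_mats[i]."""
    return np.einsum("iab,ba->i", A_mats, X)

def relative_error(A_mats, X, b):
    """Relative error ||A(X) - b||_2 / ||b||_2 of a candidate X."""
    return np.linalg.norm(apply_A(A_mats, X) - b) / np.linalg.norm(b)

rng = np.random.default_rng(0)
n, k = 40, 5                                   # n is illustrative; k = 5 as in the text
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # random rank-k matrix
d = 6 * k * n                                  # number of affine constraints
A_mats = rng.standard_normal((d, n, n))        # assumed Gaussian constraint matrices
b = apply_A(A_mats, X)

err_true = relative_error(A_mats, X, b)                  # the generating matrix
err_zero = relative_error(A_mats, np.zeros((n, n)), b)   # the all-zeros matrix
```

A solver would be run until `relative_error` of its iterate drops below the target tolerance.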

Next we evaluate our method for the problem of matrix reconstruction from random measurements. 
As in Recht et al. [RFP07], we use the MIT logo as the test image for reconstruction. The 
MIT logo we use is a 38 x 73 image and has rank four. For reconstruction, we generate random 
measurement matrices $A_i$ and measure $b_i = \mathrm{Tr}(A_i X)$. Figure 3 (b) shows that our method incurs 
significantly smaller reconstruction error than SVT in fewer iterations. 

6.2 Matrix Completion 

Next, we evaluate our method against various matrix completion methods for random low-rank 
matrices and uniform samples. We generate a random rank-$k$ matrix $X \in \mathbb{R}^{n \times n}$ and generate 
random Bernoulli samples with probability $p$. Figure 4 compares the time required by various 
methods (in log-scale) to obtain a root mean square error (RMSE) of $10^{-3}$ for fixed $k = 2$. Clearly, 
our method is substantially faster than the other methods. Next, we evaluate our method for 
increasing $k$. Figure 5 compares the time required by various methods to obtain a root mean 
square error (RMSE) of $10^{-3}$ for fixed $n = 1000$ and increasing $k$. Note that our algorithm scales 
well with increasing $k$ and is much faster than the other methods. 
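
A small-scale version of this experiment can be sketched as follows, using dense SVDs for brevity. The unit step size `eta`, the problem size, the sampling density and the iteration count are assumptions chosen so that the toy instance is easy. They are not the settings used for the figures, and the RMSE here is computed over all entries of the matrix.

```python
import numpy as np

def svp_complete(M_obs, mask, k, eta=1.0, iters=300):
    """Sketch of SVP for matrix completion.

    M_obs holds the observed entries (zeros elsewhere); mask is True on the
    sampled index set.
    """
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        residual = np.where(mask, X - M_obs, 0.0)   # error on the observed entries only
        U, s, Vt = np.linalg.svd(X - eta * residual, full_matrices=False)
        X = (U[:, :k] * s[:k]) @ Vt[:k]             # projection onto rank-k matrices
    return X

rng = np.random.default_rng(0)
n, k, p = 60, 2, 0.5                                # illustrative values
X_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
mask = rng.random((n, n)) < p                       # Bernoulli(p) sampling
M_obs = np.where(mask, X_true, 0.0)

X_hat = svp_complete(M_obs, mask, k)
rmse = np.sqrt(np.mean((X_hat - X_true) ** 2))
```

For large sparse problems the dense SVD would be replaced by an iterative routine that needs only matrix-vector products.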

Figure 3: (a) Time taken by SVP and SVT for random instances of the Affine Rank Minimization 
Problem (ARMP) with optimal rank k = 5. (b) Reconstruction error for the MIT logo. 

Figure 4: Running time (on log scale) for various methods for the matrix completion problem with 
sampling density p = .1 and optimal rank k = 2. 

Figure 5: Running time (on log scale) for various methods for the matrix completion problem with 
sampling density p = .1 and n = 1000. 



Finally, we study the behavior of our method in the presence of noise. For this experiment, we 
generate random matrices of different sizes and add approximately 5% Gaussian noise. Figure 6 
plots the error incurred and time required by various methods as n increases from 1000 to 5000. Note 
that SVT is particularly sensitive to noise and incurs high RMSE. 

Figure 6: RMSE and time required by various methods for matrix completion with p = .1, k = 2 
and around 10% of the known entries corrupted. Note that in terms of RMSE values, SVP, 
ALS and SMC perform about the same. 

Matrix Completion: Movie-Lens Dataset 

Finally, we evaluate our method on the Movie-Lens dataset [Mov], which contains 1 million ratings 
for 3900 movies by 6040 users. For SVP and ALS, we fix the rank of the matrix to be $k = 15$. 
For SVP, we set the step size r]t to be hl\ft. SVP incurs RMSE of 1.01 in 64.85 seconds, while 
SVT incurs RMSE of 1.21 in 1214.78 seconds. In contrast, ALS achieves RMSE of 0.90 in 195.34 
seconds. We attribute the relatively poor performance of SVP and SVT as compared with ALS to 
the fact that the ratings matrix is not sampled uniformly, thus violating a crucial assumption of 
both our method and SVT. As in Figure 6 (b), SVT converges much more slowly than SVP on the 
Movie-Lens data. 



7 Conclusion and Future Work 

There has been a significant amount of work recently in the area of low-rank approximations. 
Examples include minimizing rank subject to affine constraints, low-rank matrix completion, and low-rank 
plus sparse decomposition. Most of this research, with the exception of Keshavan et al. [KOM09], 
relies on relaxing the rank constraint with trace-norm and gives guarantees for recovering the opti- 
mal solution under certain additional assumptions. However, trace-norm relaxation based methods 
are typically hard to analyze and are relatively expensive in practice. 

In this paper, we proposed a simple and natural algorithm based on iterative hard-thresholding. 
We give a simple analysis of our algorithm for the affine rank minimization problem satisfying the 
restricted isometry property and give geometric convergence guarantees even in the presence of 
noise. The intermediate steps in our algorithm are less computationally demanding than those 
of current state-of-the-art methods. We empirically demonstrate that our method is significantly 
faster and more robust to both uniformly bounded and outlier noise than most existing methods. 

An immediate question arising out of our work is to prove our hypothesis bounding the incoherence 
of the iterates of SVP for low-rank matrix completion, or otherwise directly prove Conjecture 
4.3. Other directions include application of our methods to other problems of similar flavor such 
as the low-rank plus sparse matrix decomposition [CSPW09], or other matrix completion type 
problems like minimum dimensionality embedding using partial distance observations [FHB03] and 
low-rank kernel learning [MJCD08]. 